Boost K-Means
Authors
Abstract
Owing to its simplicity and versatility, k-means has remained popular since it was proposed three decades ago. Since then, continuous efforts have been made to enhance its performance. Unfortunately, a good trade-off between quality and efficiency is rarely reached. In this paper, a novel k-means variant is presented. Unlike most k-means variants, its clustering procedure is explicitly driven by an objective function that is applicable to the whole l2-space. The classic chicken-and-egg loop in k-means is simplified to a purely stochastic optimization procedure. K-means therefore becomes simpler, faster and better. The effectiveness of this new variant has been studied extensively in different contexts, such as document clustering, nearest neighbor search and image clustering. Superior performance is observed across these scenarios.
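The abstract does not state the objective function or the update rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the well-known identity that minimizing the k-means sum of squared errors is equivalent to maximizing I = sum_r ||D_r||^2 / n_r, where D_r is the sum of the vectors in cluster r and n_r is its size. Samples are visited in random order and moved one at a time whenever the move raises I, so there is no separate assignment step and centroid-update step. The names incremental_kmeans, sq_norm and sse, and the parameters n_epochs and seed, are hypothetical.

```python
import random


def sq_norm(v):
    """Squared l2 norm of a vector given as a sequence of floats."""
    return sum(x * x for x in v)


def sse(points, labels, k):
    """Sum of squared distances of each point to its cluster mean."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += p[j]
    total = 0.0
    for p, c in zip(points, labels):
        total += sum((p[j] - sums[c][j] / counts[c]) ** 2 for j in range(dim))
    return total


def incremental_kmeans(points, k, n_epochs=10, seed=0):
    """Assumed sketch: greedily move single samples between clusters
    whenever the move increases I = sum_r ||D_r||^2 / n_r.
    Requires len(points) >= k so that no cluster starts empty."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    labels = [i % k for i in range(n)]  # balanced start, then shuffled
    rng.shuffle(labels)
    sums = [[0.0] * dim for _ in range(k)]  # composite vectors D_r
    counts = [0] * k                        # cluster sizes n_r
    for p, c in zip(points, labels):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += p[j]
    order = list(range(n))
    for _ in range(n_epochs):
        rng.shuffle(order)
        moved = False
        for i in order:
            p, u = points[i], labels[i]
            if counts[u] == 1:
                continue  # never empty a cluster
            # Change in the objective caused by removing p from cluster u.
            minus = [s - x for s, x in zip(sums[u], p)]
            loss = sq_norm(minus) / (counts[u] - 1) - sq_norm(sums[u]) / counts[u]
            best_gain, best_v, best_plus = 1e-12, None, None
            for v in range(k):
                if v == u:
                    continue
                plus = [s + x for s, x in zip(sums[v], p)]
                gain = loss + sq_norm(plus) / (counts[v] + 1) - sq_norm(sums[v]) / counts[v]
                if gain > best_gain:
                    best_gain, best_v, best_plus = gain, v, plus
            if best_v is not None:
                sums[u], counts[u] = minus, counts[u] - 1
                sums[best_v], counts[best_v] = best_plus, counts[best_v] + 1
                labels[i] = best_v
                moved = True
        if not moved:
            break  # no single move improves the objective
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(incremental_kmeans(pts, k=2))
```

Because every accepted move strictly increases I, the squared error of the partition never grows, and the procedure stops as soon as a full pass produces no move. Tracking only the composite vectors and the counts keeps each candidate move cheap to evaluate.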
Similar resources
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
K-Boost: A Scalable Algorithm for High-Quality Clustering of Microarray Gene Expression Data
Microarray technology for profiling gene expression levels is a popular tool in modern biological research. Applications range from tissue classification to the detection of metabolic networks, from drug discovery to time-critical personalized medicine. Given the increase in size and complexity of the data sets produced, their analysis is becoming problematic in terms of time/quality trade-offs...
A clustering method based on boosting
It is widely recognized that the boosting methodology provides superior results for classification problems. In this paper, we propose the boost-clustering algorithm which constitutes a novel clustering methodology that exploits the general principles of boosting in order to provide a consistent partitioning of a dataset. The boost-clustering algorithm is a multi-clustering method. At each boos...
Complex Scene Analysis in Urban Areas Based on an Ensemble Clustering Method Applied on Lidar Data
3D object extraction is one of the main interests and has lots of applications in photogrammetry and computer vision. In recent years, airborne laser-scanning has been accepted as an effective 3D data collection technique for extracting spatial object models such as digital terrain models (DTM) and building models. Data clustering, also known as unsupervised learning is one of the key technique...
Boosting the Performances of the Recurrent Neural Network by the Fuzzy Min-Max
The k-means training algorithm used for the RBF (Radial Basis Function) neural network can have some weakness like empty clusters, the choice of the cluster number and the random choice of the centers of theses clusters. In this paper, we use the Fuzzy Min Max technique to boost the performances of the training algorithm. This technique is used to determine the number of the k centers and to in...
Journal title:
- CoRR
Volume: abs/1610.02483
Issue: -
Pages: -
Publication date: 2016